A method for performing parametric image alignment, which, when given any local match-measure, applies global estimation directly ...



[FIG. 1: image processor 106 comprising a CPU, a memory holding the image registration routine 116, support circuits 112, and peripherals 114]







WO 98/50885                                              PCT/US98/09021



METHOD AND APPARATUS FOR PERFORMING GLOBAL IMAGE 
ALIGNMENT USING ANY LOCAL MATCH MEASURE 

This application claims benefit of U.S. provisional patent application serial number 60/046,069, filed May 9, 1997 and incorporated herein by reference.

BACKGROUND OF THE DISCLOSURE

The present invention generally relates to image processing systems and, more particularly, to a method and apparatus for aligning images within an image processing system.

In many image processing systems it is necessary that images be aligned with one another to perform image merging or analysis. The phrase "image processing", as used herein, is intended to encompass the processing of all forms of images, including temporally unrelated images as well as images (frames) of a video signal, i.e., a sequence of temporally related images. Image alignment in an image processing system is necessary to create mosaics of multiple images, to perform some forms of image compression, to perform motion estimation and/or tracking, and the like. Alignment (also known as registration) of images begins with determining a displacement field that represents the offset between the images and then warping one image to the other to remove or minimize the offset. The images may be taken from the same sensor or from two entirely different sensors, possibly of different modality (e.g., an infrared sensor and a visible sensor). Often, the displacement field that defines the offset between the images can be described as a global parametric transformation between the two images, such as an affine, quadratic, or projective transformation. Many techniques have been developed for the parametric registration of a pair of images.

Most flow-based techniques divide the registration process into two steps: first a flow field is estimated; then, using regression, the global parametric transformation that best describes the flow field is found. However, the local flow estimates are often noisy and unreliable, resulting in poor registration accuracy and a lack of robustness.
To overcome this problem, direct gradient-based techniques have been developed. These techniques estimate the global transformation parameters by directly using local image intensity information, without first computing a local flow field. They achieve high registration accuracy because they avoid the noisy step of local flow estimation. However, these techniques assume that the intensity values of corresponding pixels in the two images are the same, which is known as the "brightness constancy assumption". As a result, the applicability of direct gradient-based techniques is limited to situations in which the images to be registered are substantially similar in appearance. Consequently, direct gradient-based techniques cannot handle large changes in illumination and contrast between the images. Because the images need to be substantially similar to be registered using direct gradient-based techniques, images produced by sensors of different modality and/or containing a substantial range of motion cannot be accurately registered by these techniques.

Therefore, a need exists in the art for a method and apparatus that aligns images having substantial illumination differences between the images and/or a substantial amount of motion and/or other image differences that would otherwise make registration difficult.

SUMMARY OF THE INVENTION
The disadvantages of the prior art are overcome by the present invention, a method that, when given any local match measure, applies global estimation directly to the local match-measure data, without first performing an intermediate step of local flow estimation. Any local match measure can be used as part of the inventive method, such as correlation, normalized correlation, squared or absolute brightness difference, statistical measures such as mutual information, and the like. Global estimation constrains the analysis of the local match measure, thereby avoiding noisy local motion estimates, while still providing an accurate result for image registration.

In one embodiment of the invention, the inventive generalized global alignment method is used with a normalized-correlation match measure, resulting in a global correlation-based alignment method that combines the robustness and accuracy of global alignment with the broad applicability of the normalized-correlation match measure. The inventive method overcomes many of the limitations of existing gradient-based and flow-based techniques. In particular, the novel method can handle large appearance differences between the images and large image motion within the image scene. Also, in contrast to the flow-based methods, the invention accurately registers imagery with sparse texture or feature content.

BRIEF DESCRIPTION OF THE DRAWINGS
The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1 depicts a block diagram of an image processing system of the present invention;

FIG. 2 depicts a flow diagram representing the method of the present invention; and

FIG. 3 depicts a flow diagram representing a second embodiment of the method of the present invention.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

DETAILED DESCRIPTION

FIG. 1 depicts a block diagram of an image processing system 100 comprising at least one image sensor 104 and an image processor 106. The present invention can be used to process images produced by multiple sensors (e.g., sensors 104 and 104A) and is especially useful in processing images that are produced by sensors having different modalities, e.g., an infrared-wavelength sensor 104A and a visible-wavelength sensor 104. The sensors 104 and 104A are intended to provide imagery of a three-dimensional scene 102.
To process these images, the image processor 106 comprises a central processing unit (CPU) 108, a memory device 110, conventional CPU support circuits 112 and input/output (I/O) peripherals 114. The CPU 108 is a general-purpose computer that, when executing specific routines that are recalled from memory 110, becomes a specific-purpose computer. The CPU can be any high-performance processor, such as a PENTIUM II processor manufactured by Intel Corporation or a POWER PC processor manufactured by Motorola Inc. The memory 110 can be random access memory (RAM), read only memory (ROM), a hard disk drive, a floppy disk drive, or any combination thereof. The support circuits 112 include various conventional circuits such as frame grabber circuits, analog-to-digital (A/D) circuits, clock circuits, cache, power supplies, and the like. The I/O peripherals 114 generally include a keyboard, a mouse, and a display, but may also include a video tape recorder, a video disk player, and the like. The images that are processed by the image processor 106 need not be sourced by the sensors 104; they may also be sourced from pre-recorded material, such as would be provided by a video tape recorder or other image storage device.

The present invention is a routine 116 that, when executed by the CPU 108, provides a method that, when given any local match measure, applies global estimation directly to the local match-measure data, without first performing an intermediate step of local flow estimation. Any local match measure can be used as part of the inventive method, such as correlation, normalized correlation, squared or absolute brightness difference, statistical measures such as mutual information, and the like. Global estimation constrains the analysis of the local match measure, thereby avoiding noisy local motion estimates, while still providing an accurate result for image registration.

More specifically, when a pair of images representing a three-dimensional scene 102 are to be aligned and merged, i.e., the images are to be registered, the induced image motion between the two images depends on the cameras' internal parameters, such as zoom, and external parameters, such as actual camera motion, as well as on parallax motion within the 3D scene structure. In many practical situations, the induced motion field between the image pair can be modeled in terms of one or a small number of parametric transformations. Such transformations are well known in the art, as represented in J.R. Bergen et al., "Hierarchical Model-based Motion Estimation," European Conference on Computer Vision, pp. 237-252, Santa Margherita Ligure, May 1992. The present invention expands upon this parametric transformation model by using a two-dimensional parametric transformation technique, although the approach generalizes to other classes of models as well.

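Specifically, when the motion field is a linear function of a few unknown parameters $\{p_i\}$, the motion vector $\vec u(x,y) = (u(x,y), v(x,y))^T$ can be expressed as:

$$\vec u(x,y;\vec p) = X(x,y)\,\vec p \qquad (1)$$

where $X(x,y)$ is a matrix that depends only on the pixel coordinates $(x,y)$, and $\vec p = (p_1, \dots, p_n)^T$ is the parameter vector. For example, for an affine transformation:

$$\vec u(x,y;\vec p) = \begin{bmatrix} u(x,y;\vec p) \\ v(x,y;\vec p) \end{bmatrix} = \begin{bmatrix} p_1 + p_2 x + p_3 y \\ p_4 + p_5 x + p_6 y \end{bmatrix}$$

therefore, in this case, $\vec p = (p_1, p_2, p_3, p_4, p_5, p_6)^T$ and

$$X = \begin{bmatrix} 1 & x & y & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x & y \end{bmatrix};$$

and for a quadratic transformation:

$$\vec u(x,y;\vec p) = \begin{bmatrix} p_1 + p_2 x + p_3 y + p_7 x^2 + p_8 xy \\ p_4 + p_5 x + p_6 y + p_7 xy + p_8 y^2 \end{bmatrix}$$

therefore $\vec p = (p_1, p_2, p_3, p_4, p_5, p_6, p_7, p_8)^T$ and

$$X = \begin{bmatrix} 1 & x & y & 0 & 0 & 0 & x^2 & xy \\ 0 & 0 & 0 & 1 & x & y & xy & y^2 \end{bmatrix}$$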
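Equation (1) and the two X matrices above can be made concrete with a short NumPy sketch. The function names `motion_basis` and `displacement` are illustrative, not taken from the patent; the sketch simply builds $X(x,y)$ for one pixel and multiplies it by $\vec p$.

```python
import numpy as np


def motion_basis(x, y, model="affine"):
    """Return the 2 x n matrix X(x, y) of Equation (1) for a single pixel."""
    if model == "affine":
        return np.array([[1, x, y, 0, 0, 0],
                         [0, 0, 0, 1, x, y]], dtype=float)
    if model == "quadratic":
        return np.array([[1, x, y, 0, 0, 0, x * x, x * y],
                         [0, 0, 0, 1, x, y, x * y, y * y]], dtype=float)
    raise ValueError(f"unknown motion model: {model}")


def displacement(x, y, p, model="affine"):
    """u(x, y; p) = X(x, y) p, returned as the 2-vector (u, v)."""
    return motion_basis(x, y, model) @ np.asarray(p, dtype=float)


# Pure translation by (2, -1): only p1 and p4 are non-zero.
print(displacement(10, 5, [2.0, 0, 0, -1.0, 0, 0]))
# A 10% zoom about the origin: u = 0.1 x, v = 0.1 y.
print(displacement(10, 5, [0, 0.1, 0, 0, 0, 0.1]))
```

Because $\vec u$ is linear in $\vec p$, the same `motion_basis` output is all the later regression needs from the motion model.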

Given two images f and g, the local match measure m(u,v;x,y) computes the similarity (or dissimilarity) between the pixels f(x,y) and g(x+u, y+v). Some of the common match measures used in existing registration techniques include: (i) correlation; (ii) normalized correlation, which is generally preferred over regular correlation, since it is invariant to local changes in mean and contrast; (iii) squared brightness differences, which are applicable under the brightness constancy assumption; (iv) the sum of squared brightness differences (SSD); and (v) mutual information, which measures the statistical correlation between the two signals.

The present invention performs parametric registration of two images, f and g, by determining a parametric transformation $\vec p$ that maximizes (or minimizes) a global match-measure $M(\vec p)$:

$$M(\vec p) = \sum_{(x,y)} m\big(u(x,y;\vec p),\, v(x,y;\vec p);\, x, y\big) \qquad (2)$$

where $(u(x,y;\vec p), v(x,y;\vec p))$ is the motion field described by the parametric transformation $\vec p$, and m(u,v;x,y) measures the local match between pixel (x,y) in f and pixel (x+u, y+v) in g. For example, when a correlation match-measure is used:

$$m(u,v;x,y) \overset{\text{def}}{=} \sum_{(i,j) \in W} f(x+i, y+j)\, g(x+u+i, y+v+j);$$

when a squared-brightness-difference measure is used:

$$m(u,v;x,y) \overset{\text{def}}{=} \big(f(x,y) - g(x+u, y+v)\big)^2$$
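The two example match measures can be sketched directly from their definitions. This is an illustration under stated assumptions: images are NumPy arrays indexed `img[y, x]`, W is a square window of half-width `half` (a hypothetical parameter), and there is no bounds checking.

```python
import numpy as np


def correlation_match(f, g, x, y, u, v, half=1):
    """m(u,v;x,y) = sum_{(i,j) in W} f(x+i, y+j) * g(x+u+i, y+v+j).

    Arrays are indexed img[y, x]; W is a (2*half+1) x (2*half+1) window.
    There is no bounds checking, so (x, y) and (x+u, y+v) must lie at least
    `half` pixels inside the image.
    """
    fw = f[y - half:y + half + 1, x - half:x + half + 1]
    gw = g[y + v - half:y + v + half + 1, x + u - half:x + u + half + 1]
    return float(np.sum(fw * gw))


def squared_difference_match(f, g, x, y, u, v):
    """m(u,v;x,y) = (f(x,y) - g(x+u, y+v))^2, a dissimilarity (minimized)."""
    return float((f[y, x] - g[y + v, x + u]) ** 2)


rng = np.random.default_rng(0)
f = rng.random((20, 20))
g = np.roll(f, shift=(1, 2), axis=(0, 1))  # g(x+2, y+1) = f(x, y)
print(squared_difference_match(f, g, 8, 8, 2, 1))  # 0.0 at the true shift
```

Correlation is a similarity to be maximized, while the squared difference is a dissimilarity to be minimized, which is why Equation (2) is stated as "maximizes (or minimizes)".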

The invention determines a global transformation $\vec p$ that maximizes (or minimizes) the sum of the local match-measures m. The routine of the present invention estimates $\vec p$ via regression directly on surfaces of the local match-measure m, without first committing locally to specific image displacements $\{(u(x,y), v(x,y))\}$.

The inventive routine first defines a match-measure surface: let $S^{(x,y)}(u,v)$ denote a local match-measure surface corresponding to pixel (x,y) in f. For any shift (u,v) of g relative to f, $S^{(x,y)}$ is defined as:

$$S^{(x,y)}(u,v) \overset{\text{def}}{=} m(u,v;x,y)$$

For example, when the local match-measure m(u,v;x,y) is derived using a correlation technique, $S^{(x,y)}$ is the correlation surface of pixel (x,y). Equation (2) can therefore be rewritten in terms of the collection of surfaces $\{S^{(x,y)}\}$:



$$M(\vec p) = \sum_{(x,y)} S^{(x,y)}\big(u(x,y;\vec p),\, v(x,y;\vec p)\big) = \sum_{(x,y)} S^{(x,y)}\big(\vec u(x,y;\vec p)\big) \qquad (3)$$

For compactness of notation, $\vec u$ denotes the two-dimensional vector (u,v). To solve for the $\vec p$ that maximizes $M(\vec p)$, the routine uses Newton's method, which is an iterative method. Let $\vec p_0$ denote the parametric transformation that was computed in the previous iteration step. A second-order Taylor expansion of $M(\vec p)$ around $\vec p_0$ yields:

$$M(\vec p) = M(\vec p_0) + \big(\nabla_{\vec p} M(\vec p_0)\big)^T \delta\vec p + \tfrac{1}{2}\, \delta\vec p^{\,T} H_M(\vec p_0)\, \delta\vec p \qquad (4)$$

where $\delta\vec p = \vec p - \vec p_0$ is the unknown refinement step of $\vec p_0$ that is to be solved for, $\nabla_{\vec p} M$ denotes the gradient of M (i.e., the first derivatives of M), and $H_M$ denotes the Hessian of M (i.e., the second derivatives of M):

$$\nabla_{\vec p} M(\vec p) = \left(\frac{\partial M}{\partial p_1}, \frac{\partial M}{\partial p_2}, \dots, \frac{\partial M}{\partial p_n}\right)^T, \qquad H_M(\vec p) = \left[\frac{\partial^2 M}{\partial p_i\, \partial p_j}\right]_{i,j=1}^{n}$$

In order to find the parametric transformation $\vec p$ that maximizes $M(\vec p)$, the right-hand side of Equation (4) is differentiated with respect to $\delta\vec p$ and set equal to zero: $0 = \nabla_{\vec p} M(\vec p_0) + H_M(\vec p_0)\,\delta\vec p$. Therefore:

$$\delta\vec p^{\,*} = -\big(H_M(\vec p_0)\big)^{-1} \nabla_{\vec p} M(\vec p_0) \qquad (5)$$

where $\delta\vec p^{\,*}$ denotes the $\delta\vec p$ that maximizes the quadratic approximation of $M(\vec p)$ around $\vec p_0$ (Equation (4)). Hence, given $\vec p_0$ from the previous iteration, the system estimates a refinement step $\delta\vec p^{\,*}$ to obtain a better estimate $\vec p = \vec p_0 + \delta\vec p^{\,*}$. The new $\vec p$ is then used as the initial estimate $\vec p_0$ for the next iteration.
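A single Newton step per Equation (5) can be checked on a toy problem. The concave quadratic below is a hypothetical stand-in for $M(\vec p)$, chosen because Newton's method should then land on the maximum in one step; the real $M$ is only approximately quadratic, hence the iteration.

```python
import numpy as np


def newton_step(grad_M, hess_M):
    """Equation (5): delta_p* = -H_M(p0)^{-1} grad_M(p0), computed with a
    linear solve instead of an explicit matrix inverse."""
    return -np.linalg.solve(hess_M, grad_M)


# Toy concave global measure M(p) = -(p - p_true)^T A (p - p_true).
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric positive definite
p_true = np.array([3.0, -1.0])
p0 = np.zeros(2)
grad_M = -2.0 * A @ (p0 - p_true)        # gradient of M at p0
hess_M = -2.0 * A                        # Hessian of M (constant here)
p_new = p0 + newton_step(grad_M, hess_M)
print(p_new)
```

Solving the linear system rather than forming $H_M^{-1}$ explicitly is the usual numerically safer choice.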

Expressing the right-hand side of Equation (5) in terms of the measurable quantities $\{S^{(x,y)}\}$ gives:

$$\nabla_{\vec p} M(\vec p) = \sum_{(x,y)} X^T\, \nabla_{\vec u} S(\vec u), \qquad H_M(\vec p) = \sum_{(x,y)} X^T H_S(\vec u)\, X \qquad (6)$$

where X is the matrix defined in Equation (1) (i.e., $\vec u(x,y;\vec p) = X(x,y)\,\vec p$), $\nabla_{\vec u} S(\vec u)$ is the gradient of $S^{(x,y)}$ at $\vec u(x,y;\vec p)$, and $H_S(\vec u)$ is the Hessian of $S^{(x,y)}$ at $\vec u(x,y;\vec p)$:

$$\nabla_{\vec u} S(\vec u) = \left(\frac{\partial S^{(x,y)}}{\partial u}, \frac{\partial S^{(x,y)}}{\partial v}\right)^T, \qquad H_S(\vec u) = \begin{bmatrix} \dfrac{\partial^2 S^{(x,y)}}{\partial u^2} & \dfrac{\partial^2 S^{(x,y)}}{\partial u\, \partial v} \\[1ex] \dfrac{\partial^2 S^{(x,y)}}{\partial u\, \partial v} & \dfrac{\partial^2 S^{(x,y)}}{\partial v^2} \end{bmatrix}$$

Substituting Equation (6) into Equation (5) provides an expression for the refinement step $\delta\vec p^{\,*}$ in terms of the local match-measure surfaces $\{S^{(x,y)}\}$:

$$\delta\vec p^{\,*} = -\Big(\sum_{(x,y)} X^T H_S(\vec u_0)\, X\Big)^{-1} \Big(\sum_{(x,y)} X^T\, \nabla_{\vec u} S(\vec u_0)\Big) \qquad (7)$$

where $\vec u_0 = \vec u(x,y;\vec p_0)$ is the displacement induced at pixel (x,y) by the estimated parametric transformation $\vec p_0$ at the previous iteration. The process represented by Equation (7) functions as follows: at each iteration, for each pixel (x,y), the local quadratic approximation of the pixel's match-measure surface $S^{(x,y)}(\vec u)$ is computed around its previously estimated displacement $\vec u_0$. These local quadratic approximations are then used in Equation (7) to solve for the global parametric refinement $\delta\vec p^{\,*}$.

To account for large misalignments between the two images, the system performs multi-resolution coarse-to-fine estimation, e.g., as in J.R. Bergen et al., "Hierarchical Model-based Motion Estimation," European Conference on Computer Vision, pp. 237-252, Santa Margherita Ligure, May 1992. To facilitate a coarse-to-fine alignment process, a Gaussian (or a Laplacian) pyramid is constructed for each of the two images. Although Gaussian and/or Laplacian pyramids are generally used, other forms of image decomposition, such as wavelets, may be appropriate. As such, the regression process described above is used at each level of the pyramids to accurately process the images.
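The regression of Equation (7) accumulates one small matrix and one vector per pixel and then solves a single n x n system. The sketch below assumes the per-pixel gradients and Hessians of $S^{(x,y)}$ at $\vec u_0$ are already available (how to estimate them is a separate step); the synthetic surfaces in the usage part are hypothetical, chosen so the exact answer is known.

```python
import numpy as np


def regression_step(coords, grads, hessians, basis):
    """Equation (7): delta_p* = -(sum X^T H_S X)^{-1} (sum X^T grad S).

    coords   -- pixel coordinates (x, y) taking part in the regression
    grads    -- gradient of S^(x,y) at u0 for each pixel (2-vectors)
    hessians -- Hessian of S^(x,y) at u0 for each pixel (2 x 2 arrays)
    basis    -- function (x, y) -> 2 x n matrix X(x, y) of Equation (1)
    """
    A, b = None, None
    for (x, y), grad_S, hess_S in zip(coords, grads, hessians):
        X = basis(x, y)
        if A is None:
            n = X.shape[1]
            A, b = np.zeros((n, n)), np.zeros(n)
        A += X.T @ hess_S @ X
        b += X.T @ grad_S
    return -np.linalg.solve(A, b)


def affine_basis(x, y):
    return np.array([[1, x, y, 0, 0, 0], [0, 0, 0, 1, x, y]], dtype=float)


# Synthetic concave surfaces S(u) = -||u - X p_true||^2, expanded at u0 = 0
# (i.e. p0 = 0): gradient 2 X p_true, Hessian -2 I.
p_true = np.array([1.5, 0.01, -0.02, -0.5, 0.03, 0.0])
coords = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
grads = [2.0 * affine_basis(x, y) @ p_true for x, y in coords]
hessians = [-2.0 * np.eye(2) for _ in coords]
delta = regression_step(coords, grads, hessians, affine_basis)
print(delta)
```

Because every surface here is exactly quadratic and peaks at the true displacement, one regression step recovers `p_true`; on real surfaces the step is only a refinement and is iterated.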

FIG. 2 depicts a flow diagram of a routine 116 representing the method of the present invention as executed by the CPU of FIG. 1. The routine begins at step 200 and proceeds to step 202, where pyramid processing is performed to decompose a pair of images f and g into a plurality of pyramid levels. Images f and g are supplied by one or more sensors, as described above, and digitally stored in memory as arrays of pixel values.

In the following description, $f_l$ and $g_l$ denote the images at resolution level $l$ in the pyramids of images f and g, respectively. At step 204, the routine selects the coarsest level of the pyramids of each image. Starting at the selected coarsest resolution level with $\vec p_0$ initially set to 0, the following steps are performed at each resolution level of the pyramids. At step 206, for each pixel (x,y) in $f_l$, the routine computes a local match-measure surface around $\vec u_0$ (i.e., around the displacement estimated at the previous iteration: $\vec u_0 = \vec u(x,y;\vec p_0) = X\vec p_0$). In practice, the match-measure surface is estimated only for displacements $\vec u$ of $g_l$ within a radius d around $\vec u_0$, i.e.:

$$S^{(x,y)}(\vec u) = m(u,v;x,y) \quad \forall\, \vec u = (u,v) \ \text{s.t.}\ \|\vec u - \vec u_0\| \le d$$

where the distance d of possible $\vec u$ from $\vec u_0$ is determined by the size of the masks used for discretely estimating the first- and second-order derivatives of $S^{(x,y)}(\vec u)$ at $\vec u_0$. Although other masking functions can be used, in the present embodiment of the invention, Beaudet's masks were used to estimate the first- and second-order derivatives of the surfaces. Beaudet's masks are described in detail in Paul R. Beaudet, "Rotationally Invariant Image Operators," International Conference on Pattern Recognition, pp. 579-583 (1978). Satisfactory results have been found when using 3x3 masks (i.e., d = 1) or 5x5 masks (i.e., d = 2).
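Step 206 can be sketched in two parts: sample the surface on a (2d+1) x (2d+1) grid of integer shifts around $\vec u_0$, then estimate its gradient and Hessian at the centre. Beaudet's masks are not reproduced here; as a stated substitute, the sketch uses a least-squares quadratic fit over the same neighbourhood, which plays the same role. The helper `neg_window_ssd` (negative windowed SSD, so larger is better) and the integer `u0`, `v0` are illustrative assumptions.

```python
import numpy as np


def match_surface(f, g, x, y, u0, v0, d, measure):
    """Sample S^(x,y)(u, v) = measure(f, g, x, y, u, v) on the integer grid
    |u - u0| <= d, |v - v0| <= d (u0, v0 assumed integer here).
    Row index holds the v offset, column index the u offset."""
    size = 2 * d + 1
    surface = np.empty((size, size))
    for dv in range(-d, d + 1):
        for du in range(-d, d + 1):
            surface[dv + d, du + d] = measure(f, g, x, y, u0 + du, v0 + dv)
    return surface


def quadratic_fit_derivatives(surface):
    """Gradient and Hessian at the centre of a (2d+1) x (2d+1) surface sample,
    from a least-squares fit S ~ c0 + c1 du + c2 dv + c3 du^2 + c4 du dv + c5 dv^2."""
    d = surface.shape[0] // 2
    dv, du = np.mgrid[-d:d + 1, -d:d + 1]
    du = du.ravel().astype(float)
    dv = dv.ravel().astype(float)
    design = np.column_stack([np.ones_like(du), du, dv, du * du, du * dv, dv * dv])
    c, *_ = np.linalg.lstsq(design, surface.ravel(), rcond=None)
    grad = np.array([c[1], c[2]])
    hess = np.array([[2 * c[3], c[4]], [c[4], 2 * c[5]]])
    return grad, hess


def neg_window_ssd(f, g, x, y, u, v, half=2):
    """Negative sum of squared differences over a 5x5 window (to be maximized)."""
    fw = f[y - half:y + half + 1, x - half:x + half + 1]
    gw = g[y + v - half:y + v + half + 1, x + u - half:x + u + half + 1]
    return -float(np.sum((fw - gw) ** 2))


# Step 206 for one pixel with d = 1 (3 x 3 samples):
#   S = match_surface(f, g, x, y, u0, v0, 1, neg_window_ssd)
#   grad_S, hess_S = quadratic_fit_derivatives(S)
```

The fitted gradient and Hessian are exactly the per-pixel inputs that the Equation (7) regression consumes.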

At step 208, the routine performs the regression step of Equation (7) to determine the parametric refinement $\delta\vec p^{\,*}$.
At step 210, the parametric refinement is used to update $\vec p_0$: $\vec p_0 := \vec p_0 + \delta\vec p^{\,*}$. To refine the computational accuracy, the routine returns to step 206. The loop represented by steps 206, 208 and 210 is repeated (step 218) until a predefined parametric transform accuracy is achieved for a given pyramid level. The predefined parametric transform accuracy varies from application to application and generally depends upon the image resolution, the processing power of the computer performing the calculations, the complexity of the images being aligned, and the like. After repeating the process loop of steps 206, 208, and 210 for a few iterations (typically four), the parameters $\vec p$ are propagated to the next resolution level, and the process is repeated at that resolution level.

The routine then queries, at step 212, whether another pyramid level is to be processed. If the query is answered affirmatively, the routine selects, at step 214, another level of the pyramid, e.g., the next finest level, and returns to step 208. Otherwise, the routine ends at step 216. The process is generally stopped when the iterative process at the highest resolution level is completed.
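The outer coarse-to-fine loop of FIG. 2 can be sketched independently of the per-level estimator. Two assumptions to flag: the pyramid uses a 2x2 block average as a simple stand-in for Gaussian filtering, and the per-level refinement is passed in as a hypothetical `refine(f_l, g_l, p0)` callable. The propagation rule is specific to the affine model: when moving one level finer, coordinates and displacements both double, so only the translations $p_1$ and $p_4$ scale.

```python
import numpy as np


def pyramid_reduce(image):
    """One REDUCE step: 2 x 2 block average and subsample (a simple stand-in
    for the Gaussian filtering used in a true Gaussian pyramid)."""
    h, w = (image.shape[0] // 2) * 2, (image.shape[1] // 2) * 2
    im = image[:h, :w]
    return 0.25 * (im[0::2, 0::2] + im[1::2, 0::2] + im[0::2, 1::2] + im[1::2, 1::2])


def build_pyramid(image, levels):
    """Return [finest, ..., coarsest] (step 202)."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(pyramid_reduce(pyramid[-1]))
    return pyramid


def to_finer_level(p):
    """Propagate affine parameters one level finer. Pixel coordinates and
    displacements both double, so only the translations p1 and p4 scale."""
    p = np.array(p, dtype=float)
    p[[0, 3]] *= 2.0
    return p


def coarse_to_fine(f, g, levels, refine, iterations=4):
    """Outer loop of FIG. 2. `refine(f_l, g_l, p0)` is a caller-supplied
    function returning delta_p, e.g. surfaces plus the Equation (7) regression."""
    f_pyr, g_pyr = build_pyramid(f, levels), build_pyramid(g, levels)
    p0 = np.zeros(6)                          # p0 = 0 at the coarsest level
    for level in range(levels - 1, -1, -1):   # coarsest (step 204) to finest
        for _ in range(iterations):           # steps 206-210, typically four
            p0 = p0 + refine(f_pyr[level], g_pyr[level], p0)
        if level > 0:
            p0 = to_finer_level(p0)           # propagate to the next level
    return p0
```

Keeping the estimator as a parameter mirrors the document's point that the global regression is independent of which local match measure produced the surfaces.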

In practice, to improve performance, an image warping step 220 can be added before each iteration. An image warping process that would be useful in this step is disclosed in the Bergen et al. paper cited above. To implement image warping, the image g (an inspection image) is warped towards the image f (the reference image) according to the current estimated parametric transformation $\vec p_0$. After warping the images, $\vec p_0$ is again set to 0, and $\delta\vec p^{\,*}$ is estimated between the two warped images. Warping compensates for the spatial distortions between the two images (e.g., scale differences, rotations, etc.), and hence improves the quality of the local match-measures, which are generally based on a window around a pixel, such as in correlation.
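Warping g towards f under an affine $\vec p_0$ means building $g_w(x,y) = g(x + u(x,y;\vec p_0),\, y + v(x,y;\vec p_0))$, so that $g_w$ should match f. The sketch below assumes bilinear interpolation with border clamping; the Bergen et al. warper may differ in its interpolation and boundary handling.

```python
import numpy as np


def warp_affine(g, p):
    """Return g_w with g_w(x, y) = g(x + u(x,y;p), y + v(x,y;p)) for affine p,
    using bilinear interpolation; samples outside g are clamped to the border."""
    h, w = g.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    xs = np.clip(x + p[0] + p[1] * x + p[2] * y, 0, w - 1)
    ys = np.clip(y + p[3] + p[4] * x + p[5] * y, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    ax, ay = xs - x0, ys - y0                 # fractional parts
    top = (1 - ax) * g[y0, x0] + ax * g[y0, x1]
    bottom = (1 - ax) * g[y1, x0] + ax * g[y1, x1]
    return (1 - ay) * top + ay * bottom


rng = np.random.default_rng(2)
f = rng.random((12, 12))
g = np.roll(f, shift=(1, 2), axis=(0, 1))     # g(x+2, y+1) = f(x, y)
warped = warp_affine(g, [2.0, 0, 0, 1.0, 0, 0])
```

After the warp the residual motion is small, which is what lets $\vec p_0$ be reset to 0 and the windowed match measures work on nearly aligned data.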

To further condition the regression step of Equation (7) and make the process more robust, only pixels (x,y) where the quadratic approximation of $S^{(x,y)}(\vec u)$ around $\vec u_0$ is concave are used in the regression process, i.e., only concave surfaces are selected for use in the regression and all others are ignored. If $M(\vec p)$ is minimized rather than maximized with respect to $\vec p$, then only convex (rather than concave) surfaces are used, and all other pixels are ignored. This automatic outlier rejection mechanism provides the routine with a strong locking property. The locking property arises from the well-known phenomenon that image motion close to the current global motion estimate influences subsequent motion estimates more effectively than motion that is very far from the current estimate, i.e., the iterative technique "locks on" to the motion that is close to the estimate. In the present invention, this locking property is augmented by the active rejection of local match surfaces whose shape is inconsistent with the hypothesis that there is a local minimum of the error (or maximum of the match) in the neighborhood of the current estimate.
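For a 2x2 symmetric Hessian, the concavity test reduces to a determinant and a sign check: the quadratic approximation is concave exactly when $\det H_S > 0$ and $\partial^2 S/\partial u^2 < 0$, and convex when $\det H_S > 0$ and $\partial^2 S/\partial u^2 > 0$. A minimal sketch of that selection, with a hypothetical function name:

```python
import numpy as np


def select_surfaces(hessians, maximize=True):
    """Boolean mask of pixels whose local quadratic approximation of S is
    concave (maximize=True) or convex (maximize=False). For a symmetric 2x2
    Hessian this holds iff det(H) > 0, with the sign given by H[0, 0]."""
    H = np.asarray(hessians, dtype=float)          # shape (N, 2, 2)
    det = H[:, 0, 0] * H[:, 1, 1] - H[:, 0, 1] * H[:, 1, 0]
    sign_ok = H[:, 0, 0] < 0 if maximize else H[:, 0, 0] > 0
    return (det > 0) & sign_ok


hessians = [-np.eye(2),                       # concave peak: kept
            np.eye(2),                        # convex bowl: rejected
            np.array([[1.0, 0.0], [0.0, -1.0]]),  # saddle: rejected
            np.array([[-2.0, 0.5], [0.5, -1.0]])]  # tilted concave peak: kept
print(select_surfaces(hessians))
```

The mask would be applied to the per-pixel gradients and Hessians before they enter Equation (7). A perfectly flat surface has a zero determinant and is rejected as well.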



This selective use of the match surfaces can be extended such that the system selects only those local match surfaces that meet specific criteria for use in the regression process. As such, once all the match surfaces are computed in step 206 of FIG. 2, that step may optionally prune the number of match surfaces used in the regression process using criteria such as the gradient (slope) of the surface, the surface peak, the surface shape, and the like. Such selection will remove surfaces that are, for example, flat and prone to be noisy.

In a specific embodiment of the invention, the generalized process described above can be used with a normalized-correlation function as the local match-measure. The normalized-correlation function is represented as:

$$m(u,v;x,y) = \frac{\sum \big(f(x+i, y+j) - \bar f_W\big)\big(g(x+u+i, y+v+j) - \bar g_W\big)}{\sqrt{\sum \big(f(x+i, y+j) - \bar f_W\big)^2 \; \sum \big(g(x+u+i, y+v+j) - \bar g_W\big)^2}}$$

where the summations are performed for $(i,j) \in W$, and $\bar f_W$ and $\bar g_W$ denote the mean brightness values within corresponding windows around pixel (x,y) in f and pixel (x+u, y+v) in g, respectively. Normalized correlation is widely applicable, since it is invariant to local changes in mean and contrast, i.e., it is unaffected when the images f and g are linearly related within the window W. Since the windows W are usually chosen to be small, global alignment based on local normalized correlation is invariant to a variety of non-linear global brightness variations between the two images f and g. The global correlation-based alignment process therefore combines the robustness and accuracy of global estimation with the broad applicability and large capture range of normalized correlation.
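The normalized-correlation measure can be sketched per pixel as below, with `half=4` giving the 9x9 window mentioned in the next paragraph. Returning 0.0 for a zero-variance window is an assumed convention that the text does not specify.

```python
import numpy as np


def normalized_correlation(f, g, x, y, u, v, half=4):
    """Local normalized correlation between the (2*half+1)^2 window around
    f(x, y) and the window around g(x+u, y+v); half=4 gives a 9x9 window.
    Arrays are indexed img[y, x]. Returns 0.0 when either window has zero
    variance (an assumed convention)."""
    fw = f[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    gw = g[y + v - half:y + v + half + 1, x + u - half:x + u + half + 1].astype(float)
    fz, gz = fw - fw.mean(), gw - gw.mean()
    denom = np.sqrt(np.sum(fz * fz) * np.sum(gz * gz))
    return float(np.sum(fz * gz) / denom) if denom > 0 else 0.0


rng = np.random.default_rng(3)
f = rng.random((30, 30))
# g is a shifted copy of f with a different gain (contrast) and offset (mean).
g = 3.0 * np.roll(f, shift=(1, 2), axis=(0, 1)) + 10.0
print(normalized_correlation(f, g, 15, 15, 2, 1))
```

At the true shift the result is 1 despite the gain and offset change, and a contrast-reversed copy gives -1, which is the "up to a sign reversal" behaviour relied on for multi-sensor imagery.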

In the present embodiment of the invention, the normalized correlation is estimated using 9 x 9 correlation windows applied to Laplacian pyramid images. The quadratic approximations are based on 3 x 3 Beaudet's masks (i.e., d = 1). The global correlation-based alignment process of the present invention has been applied to many image pairs, consisting of a variety of types of scenes and motions, including pairs of images containing actual image motion, external camera motion, and internal camera motion, e.g., camera zoom. The process of the present invention registered the images with superior results over the prior-art flow-field-based and gradient-based registration techniques.

Due to the robust nature of the present invention, it can be used to register imagery produced by sensors of varying types, i.e., sensors having differing modalities, such as an image from a visible camera and an image from an infrared camera. In an alternative embodiment of the invention, the global correlation-based alignment method presented above is applied to registering multi-sensor images. The adaptation of the present invention to multi-sensor images is based on a number of observations regarding multi-sensor imagery. First, when the images f and g are imaged by sensors of different modalities, the relationship between the brightness values of corresponding pixels is complex and unknown. Contrast reversal may occur in some parts of the images, while not in others. Furthermore, visual features present in one sensor image may not appear in the other image, and vice versa. Moreover, multiple brightness values in f may map to a single brightness value in g, and vice versa. In other words, the two multi-sensor images are usually not globally correlated, and often not even statistically correlated. Locally, however (i.e., within small image patches), it is realistic to assume that the two images are correlated, up to a sign reversal.
Since a multi-resolution (coarse-to-fine) search is used in the present invention to handle large misalignments between the two images, it is important that the local match measure m(u,v;x,y) be applicable at all resolution levels (i.e., applicable to low-pass or band-pass filtered images). When the two images are obtained by the same sensor, or by two different cameras of the same modality, the corresponding signals at all resolution levels are typically correlated. In a multi-sensor image pair, however, the signals are correlated primarily at high resolution levels, where the details that correspond to the physical structure in the scene are captured. Low-pass filtered multi-sensor images are usually not correlated, as low-resolution features depend heavily on the photometric and physical imaging properties of the sensors, which can differ substantially between the two images. To apply coarse-to-fine search to multi-sensor images, it is therefore necessary to project high-resolution scene information into the low-resolution levels of the pyramid.

Consequently, these observations require the foregoing method of FIG. 2 to be slightly altered to produce a multi-sensor registration process. FIG. 3 depicts a flow diagram of the multi-sensor registration routine 300. In routine 300, the images are pre-processed prior to being subjected to the global alignment routine 116 of FIG. 2. Specifically, the routine 300 begins at step 302 and proceeds to step 304 to accomplish image preprocessing. The images are pre-filtered to ensure that, when the pyramid is constructed, high-resolution information will in fact be projected into low-resolution levels. In particular, in step 306, each of the two multi-sensor images is first high-pass filtered using a Laplacian filter; then, at step 308, the filtered images are squared. The squaring process ensures that high-resolution features appear in low-resolution levels. Processing to achieve such high-resolution feature shifting is disclosed in P.J. Burt, "Smart Sensing With a Pyramid Vision Machine," Proceedings of the IEEE, 76:1006-1015 (1988). Since the Laplacian-energy image is invariant to contrast reversal in the original image, a local match measure based on normalized correlation is adequate for this kind of imagery.
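The high-pass-then-square preprocessing can be sketched as follows. The text specifies only "a Laplacian filter"; the 3x3 four-neighbour discrete Laplacian and the edge-replicating border used here are assumptions.

```python
import numpy as np


def laplacian_energy(image):
    """High-pass filter with a 3x3 four-neighbour discrete Laplacian, then
    square. Border pixels reuse edge values (np.pad with mode='edge')."""
    im = np.pad(np.asarray(image, dtype=float), 1, mode="edge")
    lap = (im[:-2, 1:-1] + im[2:, 1:-1] + im[1:-1, :-2] + im[1:-1, 2:]
           - 4.0 * im[1:-1, 1:-1])
    return lap ** 2


rng = np.random.default_rng(4)
img = rng.random((10, 10))
energy = laplacian_energy(img)
# Contrast reversal flips the sign of the Laplacian, which squaring removes.
print(np.allclose(laplacian_energy(-img), energy))
```

Squaring produces a non-negative energy map whose blobs survive low-pass filtering, so the pyramid built on it still carries the fine-scale structure at its coarse levels.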
At step 310, the global alignment routine 116 of FIG. 2 is applied with a normalized-correlation local match measure to the Laplacian-energy images (the preprocessed images) rather than to the original images f and g.

A further increase in the sensitivity of the local match measure can be achieved by using directional-derivative-energy images instead of the Laplacian-energy images. The global correlation-based registration was extended to apply simultaneously to a collection of n multi-sensor pairs of directional-derivative-energy images. The extension is performed by estimating a single parametric transformation $\vec p$ that simultaneously maximizes the local normalized correlations of n image pairs:

$$M(\vec p) = \sum_{i=1}^{n} \sum_{(x,y)} S_i^{(x,y)}\big(u(x,y;\vec p),\, v(x,y;\vec p)\big) \qquad (8)$$

where $S_i^{(x,y)}$ is the match-measure surface estimated for pixel (x,y) in the i-th image pair. In particular, this can be applied to n corresponding pairs of directional-derivative-energy images.
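Because Equation (8) is a plain sum over pairs, its Newton step is Equation (7) with the sums extended over the pair index i as well as the pixels. A minimal sketch, assuming per-pair lists of coordinates, surface gradients and Hessians; the translation-only basis in the usage part is a hypothetical simplification chosen so the answer can be checked by hand.

```python
import numpy as np


def multi_pair_step(pairs, basis):
    """Refinement step for Equation (8): the Equation (7) sums are taken over
    every image pair i and every pixel (x, y), giving one shared delta_p.
    pairs -- list of (coords, grads, hessians) tuples, one per image pair."""
    A, b = None, None
    for coords, grads, hessians in pairs:
        for (x, y), grad_S, hess_S in zip(coords, grads, hessians):
            X = basis(x, y)
            if A is None:
                n = X.shape[1]
                A, b = np.zeros((n, n)), np.zeros(n)
            A += X.T @ hess_S @ X
            b += X.T @ grad_S
    return -np.linalg.solve(A, b)


def translation_basis(x, y):
    return np.eye(2)                     # u = (p1, p2): pure translation


# Two pairs, one pixel each, with concave surfaces -||u - t_i||^2 at u0 = 0.
pairs = [([(0, 0)], [2.0 * np.array([1.0, 0.0])], [-2.0 * np.eye(2)]),
         ([(0, 0)], [2.0 * np.array([3.0, 2.0])], [-2.0 * np.eye(2)])]
step = multi_pair_step(pairs, translation_basis)
print(step)
```

With equal curvatures the shared step is the average of the two peaks, (2, 1); in general each pair's contribution is weighted by the curvature of its surfaces.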

In practice, the invention has been used with four (n = 4) directional derivatives (horizontal, vertical, and the two diagonals). Applying the global correlation-based alignment simultaneously to the 4 corresponding pairs provides the global alignment with increased robustness, which allows for successful multi-sensor registration even in very challenging situations, i.e., when there are significant differences in image content between the two sensor images. Apart from having significantly different appearance, the multi-sensor image pair can contain many non-common features, which can lead to false matches. These are overcome by the automatic outlier-rejection mechanism of the present invention.
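The four directional-derivative-energy images can be sketched with the simplest possible derivative filters. The text does not say which derivative filters were used, so the forward differences below are an assumption; each output is cropped to the same interior so that the four images stay pixel-aligned.

```python
import numpy as np


def directional_energy_images(image):
    """Squared first differences along four directions: horizontal, vertical,
    and the two diagonals. Each output is cropped to (h-2, w-2) so that all
    four are aligned on the same interior pixels."""
    im = np.asarray(image, dtype=float)
    c = im[1:-1, 1:-1]
    horizontal = im[1:-1, 2:] - c        # f(x+1, y)   - f(x, y)
    vertical = im[2:, 1:-1] - c          # f(x, y+1)   - f(x, y)
    diagonal = im[2:, 2:] - c            # f(x+1, y+1) - f(x, y)
    anti_diagonal = im[2:, :-2] - c      # f(x-1, y+1) - f(x, y)
    return [d ** 2 for d in (horizontal, vertical, diagonal, anti_diagonal)]


ramp = np.arange(25, dtype=float).reshape(5, 5)   # f(x, y) = 5 y + x
energies = directional_energy_images(ramp)
print([float(e[0, 0]) for e in energies])
```

Each of the four energy images of f would be paired with the corresponding one of g, and the four pairs fed jointly to the Equation (8) step.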

Although various embodiments which incorporate the teachings of the present invention have been shown and described in detail herein, those skilled in the art can readily devise many other varied embodiments that still incorporate these teachings.
What is claimed is:

1. A method for aligning a first image with a second image comprising the steps of:

determining a plurality of local match surfaces; and

performing a global regression across the plurality of local match surfaces to achieve alignment of the images without generating a local flow estimation.

2. The method of claim 1 wherein each of the local match surfaces in said plurality of local match surfaces is determined using a function selected from one of the following functions: correlation, normalized correlation, and squared or absolute brightness difference.

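3. The method of claim 1 wherein the global regression computes a parametric transformation as

$$\delta\vec p^{\,*} = -\Big(\sum_{(x,y)} X^T H_S(\vec u_0)\, X\Big)^{-1} \Big(\sum_{(x,y)} X^T\, \nabla_{\vec u} S(\vec u_0)\Big)$$

where:

X is a matrix defined in $\vec u(x,y;\vec p) = X(x,y)\,\vec p$;
$\{S^{(x,y)}\}$ is a local match-measure surface at location (x,y);
$\nabla_{\vec u} S(\vec u)$ is a gradient of $S^{(x,y)}$ at $\vec u(x,y;\vec p)$;
$H_S$ is a Hessian of $S^{(x,y)}$ at $\vec u(x,y;\vec p)$; and
$\vec u_0 = \vec u(x,y;\vec p_0)$ is a displacement induced at pixel (x,y) by an estimated parametric transformation $\vec p_0$ during a previous iteration.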

4. The method of claim 3 wherein said parametric transformation is iterated until the parametric transformation is deemed sufficient to align the first and second images.

5. The method of claim 4 wherein after each iteration the first and second images are warped into alignment using the parametric transformation determined during a previous iteration.
6. The method of claim 1 further comprising a step of pyramid processing each of said first and second images to decompose said first and second images into first and second image pyramids, where said global regression is performed on an image-pyramid-level-by-image-pyramid-level basis.

7. The method of claim 1 further comprising the steps of:

high pass filtering said first and second images; and
squaring the high pass filtered first and second images.

8. A method for aligning a first image with a second image comprising the steps of:

(a) decomposing said first and second images respectively into first and second image pyramids;

(b) selecting a pyramid level in each of said first and second image pyramids;

(c) determining a plurality of local match surfaces within said selected pyramid level;

(d) determining a parametric transformation that aligns the selected pyramid level of said first and second images using the following global regression expression:

$$\delta\vec p^{\,*} = -\Big(\sum_{(x,y)} X^T H_S(\vec u_0)\, X\Big)^{-1} \Big(\sum_{(x,y)} X^T\, \nabla_{\vec u} S(\vec u_0)\Big)$$

where:

X is a matrix defined in $\vec u(x,y;\vec p) = X(x,y)\,\vec p$;
$\{S^{(x,y)}\}$ is a local match-measure surface at location (x,y);
$\nabla_{\vec u} S(\vec u)$ is a gradient of $S^{(x,y)}$ at $\vec u(x,y;\vec p)$;
$H_S$ is a Hessian of $S^{(x,y)}$ at $\vec u(x,y;\vec p)$; and
$\vec u_0 = \vec u(x,y;\vec p_0)$ is a displacement induced at pixel (x,y) by an estimated parametric transformation $\vec p_0$ during a previous iteration;

(e) repeating steps (c) and (d) until said parametric transformation meets a predefined criterion; and

(f) repeating steps (b), (c), (d) and (e) until all said pyramid levels in said first and second image pyramids are aligned.

9. The method of claim 8 wherein each of the local match surfaces in the plurality of local match surfaces is determined using a function selected from one of the following functions: correlation, normalized correlation, and squared or absolute brightness difference.



10. The method of claim 8 wherein step (e) further comprises warping said first and second images into alignment using the parametric transformation and setting the parametric transformation to zero.

11. The method of claim 8 further comprising the steps of:

high pass filtering said first and second images; and
squaring the high pass filtered first and second images.